40 research outputs found

    SRL4ORL: Improving Opinion Role Labeling using Multi-task Learning with Semantic Role Labeling

    Full text link
    For over a decade, machine learning has been used to extract opinion-holder-target structures from text to answer the question "Who expressed what kind of sentiment towards what?". Recent neural approaches do not outperform state-of-the-art feature-based models for Opinion Role Labeling (ORL). We suspect this is due to the scarcity of labeled training data and address the issue with different multi-task learning (MTL) techniques that exploit a related task with substantially more data, namely Semantic Role Labeling (SRL). We show that two MTL models improve significantly over the single-task model for labeling both holders and targets, on the development and the test sets. We find that the vanilla MTL model, which makes predictions using only shared ORL and SRL features, performs best. With deeper analysis we determine what works and what might be done to improve ORL further. Comment: Published in NAACL 2018.
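    The abstract describes sharing features between ORL and SRL but not the exact architecture. The sketch below is a minimal, generic illustration of hard parameter sharing in PyTorch -- a shared encoder with task-specific output heads -- and every name, layer size, and label count in it is an assumption for illustration, not taken from the paper.

```python
# Minimal sketch (not the authors' exact model): hard parameter sharing for
# multi-task learning, where ORL and SRL share an encoder and each task has
# its own classification head. All sizes and names below are illustrative.
import torch
import torch.nn as nn

class SharedTagger(nn.Module):
    def __init__(self, vocab_size, n_orl_labels, n_srl_labels,
                 emb_dim=100, hidden_dim=200):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Shared BiLSTM encoder: its parameters are updated by both tasks.
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        # Task-specific output layers.
        self.orl_head = nn.Linear(2 * hidden_dim, n_orl_labels)
        self.srl_head = nn.Linear(2 * hidden_dim, n_srl_labels)

    def forward(self, token_ids, task):
        states, _ = self.encoder(self.embed(token_ids))
        head = self.orl_head if task == "orl" else self.srl_head
        return head(states)  # per-token label scores

# Usage sketch: alternate batches from the two datasets during training, so
# the scarce ORL data benefits from the larger SRL corpus.
model = SharedTagger(vocab_size=10_000, n_orl_labels=5, n_srl_labels=60)
scores = model(torch.randint(0, 10_000, (2, 12)), task="orl")  # (batch, seq, labels)
```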

    A Mention-Ranking Model for Abstract Anaphora Resolution

    Full text link
    Resolving abstract anaphora is an important but difficult task for text understanding. Yet, with recent advances in representation learning, this task becomes a more tangible aim. A central property of abstract anaphora is that it establishes a relation between the anaphor embedded in the anaphoric sentence and its (typically non-nominal) antecedent. We propose a mention-ranking model that learns how abstract anaphors relate to their antecedents with an LSTM-Siamese Net. We overcome the lack of training data by generating artificial anaphoric sentence--antecedent pairs. Our model outperforms state-of-the-art results on shell noun resolution. We also report first benchmark results on an abstract anaphora subset of the ARRAU corpus. This corpus presents a greater challenge due to a mixture of nominal and pronominal anaphors and a greater range of confounders. We find model variants that outperform the baselines for nominal anaphors without training on individual anaphor data, but they still lag behind for pronominal anaphors. Our model selects syntactically plausible candidates and, when syntax is disregarded, discriminates candidates using deeper features. Comment: In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark.
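    As a rough illustration of the mention-ranking idea (an encoder with shared weights scoring anaphor-candidate pairs, trained so the true antecedent is ranked highest), here is a minimal sketch. It is not the paper's architecture; all layer sizes, names, and the particular ranking loss are assumptions.

```python
# Minimal sketch (not the paper's exact architecture): a Siamese LSTM scores
# (anaphoric sentence, candidate antecedent) pairs, and a max-margin ranking
# loss pushes the gold antecedent above every other candidate.
import torch
import torch.nn as nn

class MentionRanker(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=150):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One LSTM encodes both the anaphoric sentence and the candidates
        # (shared weights -> "Siamese").
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.scorer = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1))

    def encode(self, token_ids):
        _, (h, _) = self.encoder(self.embed(token_ids))
        return h[-1]                           # final hidden state as mention vector

    def forward(self, anaphor_ids, candidate_ids):
        a = self.encode(anaphor_ids)                   # (1, hidden)
        c = self.encode(candidate_ids)                 # (n_cands, hidden)
        pairs = torch.cat([a.expand_as(c), c], dim=-1)
        return self.scorer(pairs).squeeze(-1)          # one score per candidate

def ranking_loss(scores, gold_index, margin=1.0):
    # The gold antecedent should beat each negative candidate by a margin.
    gold = scores[gold_index]
    losses = torch.clamp(margin - gold + scores, min=0.0)
    mask = torch.ones_like(losses)
    mask[gold_index] = 0.0          # do not penalize the gold candidate itself
    return (losses * mask).sum() / mask.sum()

# Example shapes: one anaphoric sentence vs. three candidate antecedents.
model = MentionRanker(vocab_size=10_000)
scores = model(torch.randint(0, 10_000, (1, 15)),
               torch.randint(0, 10_000, (3, 8)))
print(ranking_loss(scores, gold_index=0))
```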

    How Much Consistency Is Your Accuracy Worth?

    Full text link
    Contrast set consistency is a robustness measurement that evaluates the rate at which a model correctly responds to all instances in a bundle of minimally different examples relying on the same knowledge. To draw additional insights, we propose to complement consistency with relative consistency -- the probability that an equally accurate model would surpass the consistency of the proposed model, given a distribution over possible consistencies. Models with 100% relative consistency have reached a consistency peak for their accuracy. We reflect on prior work that reports consistency on contrast sets and observe that relative consistency can alter how one model's consistency is assessed relative to another's. We anticipate that our proposed measurement and insights will influence future studies aiming to promote consistent behavior in models. Comment: BlackboxNLP 2023 accepted paper, camera-ready version; 6 pages main, 3 pages appendix.
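    The abstract does not spell out how relative consistency is computed. The following Monte-Carlo sketch shows one way such a probability could be estimated: simulate equally accurate models by placing the same number of correct answers uniformly at random over the instances and compare their consistency to the observed value. The uniform-placement assumption, the orientation of the comparison, and all names are ours, not the paper's.

```python
# Illustrative Monte-Carlo sketch, not the paper's exact formulation: given
# contrast-set bundles and a model's accuracy, simulate equally accurate
# "models" by scattering the same number of correct answers uniformly at
# random, and compare their consistency to the observed one.
import random

def consistency(correct_flags, bundles):
    """Fraction of bundles whose instances are all answered correctly."""
    return sum(all(correct_flags[i] for i in b) for b in bundles) / len(bundles)

def relative_consistency(correct_flags, bundles, n_sim=10_000, seed=0):
    rng = random.Random(seed)
    observed = consistency(correct_flags, bundles)
    n_correct = sum(correct_flags)
    indices = list(range(len(correct_flags)))
    at_or_below = 0
    for _ in range(n_sim):
        flags = [False] * len(correct_flags)
        for i in rng.sample(indices, n_correct):   # same accuracy, random placement
            flags[i] = True
        if consistency(flags, bundles) <= observed:
            at_or_below += 1
    # Oriented so that 1.0 means no equally accurate simulated model is more
    # consistent, matching the abstract's "consistency peak" reading.
    return observed, at_or_below / n_sim

# Example: 6 instances grouped into 3 contrast bundles of 2.
bundles = [(0, 1), (2, 3), (4, 5)]
correct = [True, True, True, True, False, False]   # accuracy 4/6, consistency 2/3
print(relative_consistency(correct, bundles))
```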

    Latentna semantička analiza, varijante i primjene (Latent Semantic Analysis, Variants and Applications)

    Get PDF
    Nowadays we increasingly want computers to perform tasks that humans do routinely, just as quickly and efficiently. One such task is finding the few documents in a collection that are most relevant to a user's query. The first step in solving this problem is to represent the collection as a term-document matrix whose elements are the tf-idf weights of words in the documents; each document is thereby represented as a vector in the space of terms. If the query is represented as a vector as well, standard similarity measures, such as cosine similarity, can be used to compare the query with the documents. In such a space, however, synonyms are orthogonal and polysemous words are represented by a single vector regardless of the context in which they occur. Motivated by this fact, and by the large dimension of the term-document matrix, we approximate it with a lower-rank matrix obtained via the singular value decomposition (SVD), and we show that this approximation takes the context of words into account. The query is transformed into the new space as well, so that it can be compared with the document vectors in the lower-dimensional space, and we show how new documents and terms can be added to the existing latent space when the collection is dynamic. Although this method, known as latent semantic analysis (LSA), solves the problem of synonyms to some extent, the problem of polysemy remains. In addition, LSA assumes that the noise in the data (arising from language variability) has a Gaussian distribution, which is not a natural assumption. The next method, pLSA, assumes that each document is produced by a generative probabilistic process whose parameters are estimated by maximizing the likelihood. Each document is a mixture of latent concepts, and we seek the posterior probabilities of these concepts given the observations. However, pLSA treats these probabilities as model parameters, which leads to overfitting. We therefore present another model, LDA, which treats them as a distribution governed by a parameter. Like pLSA, LDA represents documents as mixtures of latent topics, but the topics are now distributions over the words of the vocabulary. This requires a distribution over distributions, for which the Dirichlet distribution is the natural choice. Finally, we briefly present topic modeling on a collection of Wikipedia articles.
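    The retrieval pipeline summarized above (tf-idf term-document matrix, rank-k SVD, folding the query into the latent space, cosine ranking) can be sketched in a few lines of Python. The toy corpus, the query, and the chosen rank below are illustrative and not taken from the thesis.

```python
# Generic LSA retrieval sketch following the steps in the abstract; the toy
# corpus, query, and rank k are illustrative, not taken from the thesis.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat",
        "dogs and cats are pets",
        "stock markets fell sharply today",
        "investors sold shares as markets dropped"]

# 1) Term-document matrix of tf-idf weights (documents as rows here).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs).toarray()          # shape: (n_docs, n_terms)

# 2) Low-rank approximation via SVD: keep the k largest singular values.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
doc_latent = U[:, :k] * s[:k]                          # documents in latent space

# 3) Fold the query into the same latent space: q_k = q V_k / s_k
#    (the standard LSA folding-in step).
query = "falling markets worry investors"
q = vectorizer.transform([query]).toarray()[0]
q_latent = q @ Vt[:k].T / s[:k]

# 4) Rank documents by cosine similarity in the latent space.
def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

scores = [cosine(q_latent, d) for d in doc_latent]
print(sorted(zip(scores, docs), reverse=True)[0])      # most relevant document
```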
